AITopics | reference frame

Collaborating Authors

reference frame

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

EVER: Edge-Assisted Auto-Verification for Mobile MR-Aided Operation

Chen, Jiangong, Zhu, Mingyu, Li, Bin

arXiv.org Artificial IntelligenceDec-5-2025

Mixed Reality (MR)-aided operation overlays digital objects on the physical world to provide a more immersive and intuitive operation process. A primary challenge is the precise and fast auto-verification of whether the user follows MR guidance by comparing frames before and after each operation. The pre-operation frame includes virtual guiding objects, while the post-operation frame contains physical counterparts. Existing approaches fall short of accounting for the discrepancies between physical and virtual objects due to imperfect 3D modeling or lighting estimation. In this paper, we propose EVER: an edge-assisted auto-verification system for mobile MR-aided operations. Unlike traditional frame-based similarity comparisons, EVER leverages the segmentation model and rendering pipeline adapted to the unique attributes of frames with physical pieces and those with their virtual counterparts; it adopts a threshold-based strategy using Intersection over Union (IoU) metrics for accurate auto-verification. To ensure fast auto-verification and low energy consumption, EVER offloads compute-intensive tasks to an edge server. Through comprehensive evaluations of public datasets and custom datasets with practical implementation, EVER achieves over 90% verification accuracy within 100 milliseconds (significantly faster than average human reaction time of approximately 273 milliseconds), while consuming only minimal additional computational resources and energy compared to a system without auto-verification.

artificial intelligence, machine learning, target frame, (18 more...)

arXiv.org Artificial Intelligence

doi: 10.1109/ISMAR67309.2025.00148

2510.18224

Country: North America > United States > Pennsylvania (0.04)

Genre: Research Report > New Finding (0.46)

Industry:

Information Technology (1.00)
Energy (1.00)

Technology:

Information Technology > Software (1.00)
Information Technology > Communications > Mobile (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
(5 more...)

Add feedback

A Neurosymbolic Framework for Interpretable Cognitive Attack Detection in Augmented Reality

Chen, Rongqian, Andreyev, Allison, Xiu, Yanming, Chilukuri, Joshua, Sen, Shunav, Imani, Mahdi, Li, Bin, Gorlatova, Maria, Tan, Gang, Lan, Tian

arXiv.org Artificial IntelligenceDec-1-2025

Augmented Reality (AR) enriches human perception by overlaying virtual elements onto the physical world. However, this tight coupling between virtual and real content makes AR vulnerable to cognitive attacks: manipulations that distort users' semantic understanding of the environment. Existing detection methods largely focus on visual inconsistencies at the pixel or image level, offering limited semantic reasoning or interpretability. To address these limitations, we introduce CADAR, a neuro-symbolic framework for cognitive attack detection in AR that integrates neural and symbolic reasoning. CADAR fuses multimodal vision-language representations from pre-trained models into a perception graph that captures objects, relations, and temporal contextual salience. Building on this structure, a particle-filter-based statistical reasoning module infers anomalies in semantic dynamics to reveal cognitive attacks. This combination provides both the adaptability of modern vision-language models and the interpretability of probabilistic symbolic reasoning. Preliminary experiments on an AR cognitive-attack dataset demonstrate consistent advantages over existing approaches, highlighting the potential of neuro-symbolic methods for robust and interpretable AR security.

large language model, machine learning, natural language, (23 more...)

arXiv.org Artificial Intelligence

2508.09185

Country:

North America > United States > Pennsylvania (0.04)
North America > United States > California > Orange County > Anaheim (0.04)

Genre:

Overview (0.66)
Research Report (0.64)

Industry:

Information Technology > Security & Privacy (1.00)
Government > Military (1.00)

Technology:

Information Technology > Human Computer Interaction > Interfaces > Virtual Reality (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
(4 more...)

Add feedback

Geometrically-Constrained Agent for Spatial Reasoning

Chen, Zeren, Lu, Xiaoya, Zheng, Zhijie, Li, Pengrui, He, Lehan, Zhou, Yijin, Shao, Jing, Zhuang, Bohan, Sheng, Lu

arXiv.org Artificial IntelligenceDec-1-2025

Vision Language Models (VLMs) exhibit a fundamental semantic-to-geometric gap in spatial reasoning: they excel at qualitative semantic inference but their reasoning operates within a lossy semantic space, misaligned with high-fidelity geometry. Current paradigms fail to bridge this gap. Training-based methods suffer from an ``oracle paradox,'' learning flawed spatial logic from imperfect oracles. Tool-integrated methods constrain the final computation but critically leave the VLM's planning process unconstrained, resulting in geometrically flawed plans. In this work, we propose Geometrically-Constrained Agent (GCA), a training-free agentic paradigm that resolves this gap by introducing a formal task constraint. Specifically, we strategically decouples the VLM's role into two stages. First, acting as a semantic analyst, the VLM translates the user's ambiguous query into the formal, verifiable task constraint, which defines the reference frame and objective. Second, acting as a task solver, the VLM generates and executes tool calls strictly within the deterministic bounds defined by the constraint. This geometrically-constrained reasoning strategy successfully resolve the semantic-to-geometric gap, yielding a robust and verifiable reasoning pathway for spatial reasoning. Comprehensive experiments demonstrate that GCA achieves SOTA performance on multiple spatial reasoning benchmarks, surpassing existing training-based and tool-integrated methods by ~27%. Please see our homepage at https://gca-spatial-reasoning.github.io.

artificial intelligence, machine learning, spatial reasoning, (18 more...)

arXiv.org Artificial Intelligence

2511.22659

Country:

Asia > China > Shanghai > Shanghai (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Spatial Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

Heterogeneous Stroke: Using Unique Vibration Cues to Improve the Wrist-Worn Spatiotemporal Tactile Display

Kim, Taejun, Shim, Youngbo Aram, Lee, Geehyuk

arXiv.org Artificial IntelligenceNov-27-2025

Beyond a simple notification of incoming calls or messages, more complex information such as alphabets and digits can be delivered through spatiotemporal tactile patterns (STPs) on a wrist-worn tactile display (WTD) with multiple tactors. However, owing to the limited skin area and spatial acuity of the wrist, frequent confusions occur between closely located tactors, resulting in a low recognition accuracy. Furthermore, the accuracies reported in previous studies have mostly been measured for a specific posture and could further decrease with free arm postures in real life. Herein, we present Heterogeneous Stroke, a design concept for improving the recognition accuracy of STPs on a WTD. By assigning unique vibrotactile stimuli to each tactor, the confusion between tactors can be reduced. Through our implementation of Heterogeneous Stroke, the alphanumeric characters could be delivered with high accuracy (93.8% for 26 alphabets and 92.4% for 10 digits) across different arm postures.

accuracy, artificial intelligence, machine learning, (19 more...)

arXiv.org Artificial Intelligence

doi: 10.1145/3411764.3445448

2511.16133

Country:

Asia > South Korea > Daejeon > Daejeon (0.40)
Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.14)
North America > United States > New York > New York County > New York City (0.05)
(14 more...)

Genre: Research Report > New Finding (0.93)

Technology:

Information Technology > Human Computer Interaction > Interfaces (1.00)
Information Technology > Artificial Intelligence > Machine Learning (0.70)
Information Technology > Hardware (0.68)

Add feedback

A Target-based Multi-LiDAR Multi-Camera Extrinsic Calibration System

Gentilini, Lorenzo, Serio, Pierpaolo, Donzella, Valentina, Pollini, Lorenzo

arXiv.org Artificial IntelligenceNov-25-2025

Extrinsic Calibration represents the cornerstone of autonomous driving. Its accuracy plays a crucial role in the perception pipeline, as any errors can have implications for the safety of the vehicle. Modern sensor systems collect different types of data from the environment, making it harder to align the data. To this end, we propose a target-based extrinsic calibration system tailored for a multi-LiDAR and multi-camera sensor suite. This system enables cross-calibration between LiDARs and cameras with limited prior knowledge using a custom ChArUco board and a tailored nonlinear optimization method. We test the system with real-world data gathered in a warehouse. Results demonstrated the effectiveness of the proposed method, highlighting the feasibility of a unique pipeline tailored for various types of sensors.

artificial intelligence, calibration, optimization problem, (17 more...)

arXiv.org Artificial Intelligence

2507.16621

Country:

North America > United States > Washington > King County > Seattle (0.04)
Europe > United Kingdom > England > Greater London > London (0.04)
Europe > Italy > Tuscany > Pisa Province > Pisa (0.04)
Europe > Italy > Emilia-Romagna > Metropolitan City of Bologna > Bologna (0.04)

Genre: Research Report (0.64)

Industry:

Automobiles & Trucks (0.48)
Transportation (0.34)

Technology:

Information Technology > Sensing and Signal Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.50)
Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles (0.49)

Add feedback

Object-centric Task Representation and Transfer using Diffused Orientation Fields

Bilaloglu, Cem, Löw, Tobias, Calinon, Sylvain

arXiv.org Artificial IntelligenceNov-25-2025

Curved objects pose a fundamental challenge for skill transfer in robotics: unlike planar surfaces, they do not admit a global reference frame. As a result, task-relevant directions such as "toward" or "along" the surface vary with position and geometry, making object-centric tasks difficult to transfer across shapes. To address this, we introduce an approach using Diffused Orientation Fields (DOF), a smooth representation of local reference frames, for transfer learning of tasks across curved objects. By expressing manipulation tasks in these smoothly varying local frames, we reduce the problem of transferring tasks across curved objects to establishing sparse keypoint correspondences. DOF is computed online from raw point cloud data using diffusion processes governed by partial differential equations, conditioned on keypoints. We evaluate DOF under geometric, topological, and localization perturbations, and demonstrate successful transfer of tasks requiring continuous physical interaction such as inspection, slicing, and peeling across varied objects. We provide our open-source codes at our website https://github.com/idiap/diffused_fields_robotics

artificial intelligence, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2511.18563

Country:

Europe > United Kingdom > North Sea > Southern North Sea (0.04)
Europe > Switzerland > Vaud > Lausanne (0.04)
Asia > Japan (0.04)
(2 more...)

Genre: Research Report > New Finding (0.46)

Industry: Education (0.34)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (0.93)
(2 more...)

Add feedback

StreamDiT: Real-Time Streaming Text-to-Video Generation

Kodaira, Akio, Hou, Tingbo, Hou, Ji, Georgopoulos, Markos, Juefei-Xu, Felix, Tomizuka, Masayoshi, Zhao, Yue

arXiv.org Artificial IntelligenceNov-17-2025

Recently, great progress has been achieved in text-to-video (T2V) generation by scaling transformer-based diffusion models to billions of parameters, which can generate high-quality videos. However, existing models typically produce only short clips offline, restricting their use cases in interactive and real-time applications. This paper addresses these challenges by proposing StreamDiT, a streaming video generation model. StreamDiT training is based on flow matching by adding a moving buffer. We design mixed training with different partitioning schemes of buffered frames to boost both content consistency and visual quality. StreamDiT modeling is based on adaLN DiT with varying time embedding and window attention. To practice the proposed method, we train a StreamDiT model with 4B parameters. In addition, we propose a multistep distillation method tailored for StreamDiT. Sampling distillation is performed in each segment of a chosen partitioning scheme. After distillation, the total number of function evaluations (NFEs) is reduced to the number of chunks in a buffer. Finally, our distilled model reaches real-time performance at 16 FPS on one GPU, which can generate video streams at 512p resolution. We evaluate our method through both quantitative metrics and human evaluation. Our model enables real-time applications, e.g. streaming generation, interactive generation, and video-to-video. We provide video results and more examples in our project website: https://cumulo-autumn.github.io/StreamDiT/

machine learning, natural language, real time system, (21 more...)

arXiv.org Artificial Intelligence

2507.03745

Genre: